🏠

Chapter 17: Tree Basics and Terminology

Understanding Trees Through a Real Problem

Understanding Trees Through a Real Problem

Before diving into abstract definitions, let's encounter a tree structure in its natural habitat. Consider this real-world scenario: you're building a company organization chart system that needs to answer questions like "How many people report to this manager?" or "What's the longest chain of command?"

Here's actual organizational data:

CEO
β”œβ”€β”€ VP Engineering
β”‚   β”œβ”€β”€ Engineering Manager A
β”‚   β”‚   β”œβ”€β”€ Senior Engineer 1
β”‚   β”‚   └── Senior Engineer 2
β”‚   └── Engineering Manager B
β”‚       └── Junior Engineer 1
└── VP Sales
    β”œβ”€β”€ Sales Manager
    β”‚   β”œβ”€β”€ Sales Rep 1
    β”‚   └── Sales Rep 2
    └── Sales Rep 3

Let's try to represent this with what we know so farβ€”nested dictionaries:

# Attempt 1: Nested dictionaries
org_chart = {
    "CEO": {
        "VP Engineering": {
            "Engineering Manager A": {
                "Senior Engineer 1": {},
                "Senior Engineer 2": {}
            },
            "Engineering Manager B": {
                "Junior Engineer 1": {}
            }
        },
        "VP Sales": {
            "Sales Manager": {
                "Sales Rep 1": {},
                "Sales Rep 2": {}
            },
            "Sales Rep 3": {}
        }
    }
}

# Try to count total employees
def count_employees(org):
    """Count all employees in the organization"""
    count = 1  # Count this person
    for subordinate, their_reports in org.items():
        count += count_employees(their_reports)
    return count

print(f"Total employees: {count_employees(org_chart)}")

Python Output:

Total employees: 11

This works! But let's try something more complexβ€”finding the longest reporting chain:

def max_reporting_depth(org):
    """Find the longest chain of command"""
    if not org:  # No subordinates
        return 0

    max_depth = 0
    for subordinate, their_reports in org.items():
        depth = 1 + max_reporting_depth(their_reports)
        max_depth = max(max_depth, depth)

    return max_depth

print(f"Longest reporting chain: {max_reporting_depth(org_chart)}")

Python Output:

Longest reporting chain: 4

Now let's try to find a specific person:

def find_person(org, target_name):
    """Find if a person exists in the organization"""
    for name, reports in org.items():
        if name == target_name:
            return True
        if find_person(reports, target_name):
            return True
    return False

print(f"Is 'Senior Engineer 1' in org? {find_person(org_chart, 'Senior Engineer 1')}")
print(f"Is 'CEO' in org? {find_person(org_chart, 'CEO')}")

Python Output:

Is 'Senior Engineer 1' in org? True
Is 'CEO' in org? False

Diagnostic Analysis: Understanding the Failure

Waitβ€”the CEO is definitely in the organization! Let's trace what happened:

The complete execution trace:

find_person({"CEO": {...}}, "CEO")
  β†’ Loop through items: name="CEO", reports={...}
  β†’ Is "CEO" == "CEO"? YES!
  β†’ But wait... we're checking the KEY, not the root
  β†’ The function signature assumes the root is implicit
  β†’ We never check if the root itself matches

Root cause identified: Our dictionary representation conflates the container with the content. The CEO is the key of the outer dictionary, but our function only checks keys of subordinates.

Why the current approach is problematic: 1. The root node is treated differently from all other nodes 2. We can't easily store additional data about each person (title, salary, hire date) 3. The structure is awkward: {"name": {"subordinate_name": {...}}} 4. We're mixing the person's identity (name) with their relationships (reports)

What we need: A clear separation between: - Node data (the person's information) - Node relationships (who reports to whom) - Tree structure (how nodes connect)

This is exactly what tree data structures provide. Let's build one properly.

Tree Terminology: The Language of Hierarchies

Tree Terminology: The Language of Hierarchies

Before we fix our organization chart, we need to establish precise vocabulary. Trees have specific terminology that makes discussing them unambiguous.

Core Concepts

A tree is a hierarchical data structure consisting of nodes connected by edges, with these properties: - One root node at the top (no parent) - Every other node has exactly one parent - Nodes can have zero or more children - No cycles (you can't follow edges and return to where you started)

Let's visualize this with our organization:

         [CEO]                    ← Root (no parent)
           |
    β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
    |             |
[VP Eng]      [VP Sales]          ← Children of CEO, Parents of their reports
    |             |
β”Œβ”€β”€β”€β”΄β”€β”€β”€β”     β”Œβ”€β”€β”€β”΄β”€β”€β”€β”
|       |     |       |
[EM A] [EM B] [SM]  [SR3]         ← Some are parents, some are leaves
  |       |     |
β”Œβ”€β”΄β”€β”   [JE1] β”Œβ”΄β”
|   |         |  |
[SE1][SE2]  [SR1][SR2]            ← Leaves (no children)

Essential Terminology

Node: A single element in the tree containing data - In our example: each person is a node

Edge: A connection between a parent and child - The lines connecting people in the org chart

Root: The topmost node with no parent - The CEO in our example - Every tree has exactly one root

Parent: A node with children below it - VP Engineering is the parent of both Engineering Managers

Child: A node directly connected below another node - Engineering Manager A is a child of VP Engineering

Siblings: Nodes that share the same parent - Engineering Manager A and Engineering Manager B are siblings

Leaf (or Terminal Node): A node with no children - Senior Engineer 1, Senior Engineer 2, Junior Engineer 1, etc. - Also called "external nodes"

Internal Node: A node with at least one child - CEO, VP Engineering, Engineering Manager A, etc. - All non-leaf nodes

Ancestor: Any node on the path from a node up to the root - For Senior Engineer 1: ancestors are Engineering Manager A, VP Engineering, and CEO

Descendant: Any node on the path from a node down to leaves - For VP Engineering: all engineers and managers under them

Subtree: A node and all its descendants - The entire Engineering department is a subtree rooted at VP Engineering

Depth (or Level): Distance from the root - CEO is at depth 0 - VPs are at depth 1 - Managers are at depth 2 - Engineers/Reps are at depth 3

Height of a node: Longest path from that node down to a leaf - CEO has height 3 (CEO β†’ VP β†’ Manager β†’ Engineer) - Engineering Manager B has height 1 (Manager β†’ Engineer) - All leaves have height 0

Height of the tree: Height of the root node - Our org chart has height 3

Let's verify these concepts with code:

# Let's manually trace depth and height for our org chart

# Depth (distance from root):
depths = {
    "CEO": 0,
    "VP Engineering": 1,
    "VP Sales": 1,
    "Engineering Manager A": 2,
    "Engineering Manager B": 2,
    "Sales Manager": 2,
    "Sales Rep 3": 2,
    "Senior Engineer 1": 3,
    "Senior Engineer 2": 3,
    "Junior Engineer 1": 3,
    "Sales Rep 1": 3,
    "Sales Rep 2": 3
}

# Height (longest path to any leaf):
heights = {
    "CEO": 3,                    # CEO β†’ VP β†’ Manager β†’ Engineer
    "VP Engineering": 2,         # VP β†’ Manager β†’ Engineer
    "VP Sales": 2,               # VP β†’ Manager β†’ Rep
    "Engineering Manager A": 1,  # Manager β†’ Engineer
    "Engineering Manager B": 1,  # Manager β†’ Engineer
    "Sales Manager": 1,          # Manager β†’ Rep
    "Sales Rep 3": 0,            # Leaf
    "Senior Engineer 1": 0,      # Leaf
    "Senior Engineer 2": 0,      # Leaf
    "Junior Engineer 1": 0,      # Leaf
    "Sales Rep 1": 0,            # Leaf
    "Sales Rep 2": 0             # Leaf
}

# Verify: depth + height should vary, but max(depth + height) = tree height
for person in ["CEO", "VP Engineering", "Senior Engineer 1"]:
    d = depths[person]
    h = heights[person]
    print(f"{person:25} depth={d}, height={h}, sum={d+h}")

Python Output:

CEO                       depth=0, height=3, sum=3
VP Engineering            depth=1, height=2, sum=3
Senior Engineer 1         depth=3, height=0, sum=3

Notice that depth + height equals the tree height (3) for nodes on the longest path from root to leaf. This is a useful property for validation.

Binary Trees vs General Trees

Binary Trees vs General Trees

Trees come in two fundamental varieties, each suited to different problems.

General Trees (N-ary Trees)

A general tree allows each node to have any number of children (0, 1, 2, 3, ..., n).

Our organization chart is a general tree: - CEO has 2 children (VPs) - VP Engineering has 2 children (Managers) - Engineering Manager A has 2 children (Engineers) - Sales Manager has 2 children (Reps) - VP Sales has 2 children (Manager and Rep)

Use cases for general trees: - File systems (folders can contain any number of files/subfolders) - Organization charts (managers can have any number of reports) - XML/HTML DOM (elements can have any number of children) - Category hierarchies (categories can have any number of subcategories)

Binary Trees

A binary tree restricts each node to at most 2 children, conventionally called left and right.

         [5]
        /   \
      [3]   [8]
     /  \   /  \
   [1] [4][7] [9]

Key properties: - Each node has 0, 1, or 2 children - Children are distinguished as "left" and "right" (order matters) - Even if a node has only one child, we specify whether it's left or right

Use cases for binary trees: - Binary search trees (efficient searching/sorting) - Expression trees (mathematical expressions) - Huffman coding trees (data compression) - Decision trees (machine learning)

Structural Differences

Let's see how the same data looks in both structures:

# General tree: representing a simple hierarchy
# Each node can have any number of children

general_tree_example = {
    "value": "Root",
    "children": [
        {
            "value": "Child 1",
            "children": [
                {"value": "Grandchild 1", "children": []},
                {"value": "Grandchild 2", "children": []},
                {"value": "Grandchild 3", "children": []}  # 3 children!
            ]
        },
        {
            "value": "Child 2",
            "children": [
                {"value": "Grandchild 4", "children": []}
            ]
        },
        {
            "value": "Child 3",
            "children": []
        }
    ]
}

# Binary tree: same data forced into binary structure
# Each node has at most 2 children (left and right)

binary_tree_example = {
    "value": "Root",
    "left": {
        "value": "Child 1",
        "left": {
            "value": "Grandchild 1",
            "left": None,
            "right": None
        },
        "right": {
            "value": "Grandchild 2",
            "left": None,
            "right": {
                "value": "Grandchild 3",  # Had to chain this
                "left": None,
                "right": None
            }
        }
    },
    "right": {
        "value": "Child 2",
        "left": {
            "value": "Grandchild 4",
            "left": None,
            "right": None
        },
        "right": {
            "value": "Child 3",
            "left": None,
            "right": None
        }
    }
}

print("General tree allows 3 grandchildren under Child 1")
print("Binary tree must chain or restructure them")

Python Output:

General tree allows 3 grandchildren under Child 1
Binary tree must chain or restructure them

When to Choose Each Type

Choose general trees when: - The number of children varies naturally (file systems, org charts) - You're modeling real-world hierarchies - Flexibility is more important than structure

Choose binary trees when: - You need efficient searching (binary search trees) - You're implementing specific algorithms (heaps, expression evaluation) - The two-child constraint provides algorithmic advantages

For the rest of this chapter, we'll focus on general trees since they're more intuitive for learning recursion. Binary trees get special attention in later chapters due to their algorithmic importance.

Tree Representation in Python: Building a Proper Node Class

Tree Representation in Python: Building a Proper Node Class

Now let's fix our organization chart by building a proper tree structure. We'll create a TreeNode class that cleanly separates data from structure.

Iteration 1: Basic Node Structure

class TreeNode:
    """A node in a general tree (n-ary tree)"""

    def __init__(self, value):
        self.value = value      # The data stored in this node
        self.children = []      # List of child nodes

    def add_child(self, child_node):
        """Add a child to this node"""
        self.children.append(child_node)

    def __repr__(self):
        """String representation for debugging"""
        return f"TreeNode({self.value})"

# Build our organization chart properly
ceo = TreeNode("CEO")
vp_eng = TreeNode("VP Engineering")
vp_sales = TreeNode("VP Sales")

ceo.add_child(vp_eng)
ceo.add_child(vp_sales)

eng_mgr_a = TreeNode("Engineering Manager A")
eng_mgr_b = TreeNode("Engineering Manager B")

vp_eng.add_child(eng_mgr_a)
vp_eng.add_child(eng_mgr_b)

se1 = TreeNode("Senior Engineer 1")
se2 = TreeNode("Senior Engineer 2")

eng_mgr_a.add_child(se1)
eng_mgr_a.add_child(se2)

je1 = TreeNode("Junior Engineer 1")
eng_mgr_b.add_child(je1)

sales_mgr = TreeNode("Sales Manager")
sr3 = TreeNode("Sales Rep 3")

vp_sales.add_child(sales_mgr)
vp_sales.add_child(sr3)

sr1 = TreeNode("Sales Rep 1")
sr2 = TreeNode("Sales Rep 2")

sales_mgr.add_child(sr1)
sales_mgr.add_child(sr2)

print(f"Root: {ceo}")
print(f"Root's children: {ceo.children}")
print(f"VP Engineering's children: {vp_eng.children}")

Python Output:

Root: TreeNode(CEO)
Root's children: [TreeNode(VP Engineering), TreeNode(VP Sales)]
VP Engineering's children: [TreeNode(Engineering Manager A), TreeNode(Engineering Manager B)]

Now let's rewrite our functions to work with this structure:

def count_employees(node):
    """Count all employees in the tree rooted at node"""
    count = 1  # Count this node
    for child in node.children:
        count += count_employees(child)
    return count

def find_person(node, target_name):
    """Find if a person exists in the tree"""
    if node.value == target_name:  # Check the root!
        return True

    for child in node.children:
        if find_person(child, target_name):
            return True

    return False

def max_depth(node):
    """Find the height of the tree (longest path to leaf)"""
    if not node.children:  # Leaf node
        return 0

    max_child_depth = 0
    for child in node.children:
        child_depth = max_depth(child)
        max_child_depth = max(max_child_depth, child_depth)

    return 1 + max_child_depth

print(f"Total employees: {count_employees(ceo)}")
print(f"Is 'CEO' in org? {find_person(ceo, 'CEO')}")
print(f"Is 'Senior Engineer 1' in org? {find_person(ceo, 'Senior Engineer 1')}")
print(f"Tree height: {max_depth(ceo)}")

Python Output:

Total employees: 11
Is 'CEO' in org? True
Is 'Senior Engineer 1' in org? True
Tree height: 3

Improvement: Now all our functions work correctly! The CEO is found, and the structure is much clearer.

What We Gained

Compare the old dictionary approach to the new TreeNode approach:

Old (Dictionary):

org_chart = {
    "CEO": {
        "VP Engineering": {...}
    }
}

New (TreeNode):

ceo = TreeNode("CEO")
vp_eng = TreeNode("VP Engineering")
ceo.add_child(vp_eng)

Iteration 2: Adding Metadata

Let's enhance our TreeNode to store more information:

class TreeNode:
    """Enhanced node with metadata"""

    def __init__(self, name, title=None, hire_date=None):
        self.name = name
        self.title = title
        self.hire_date = hire_date
        self.children = []

    def add_child(self, child_node):
        self.children.append(child_node)

    def __repr__(self):
        return f"TreeNode({self.name}, {self.title})"

# Rebuild with metadata
ceo = TreeNode("Alice Smith", "Chief Executive Officer", "2015-01-01")
vp_eng = TreeNode("Bob Jones", "VP of Engineering", "2016-03-15")
vp_sales = TreeNode("Carol White", "VP of Sales", "2016-06-01")

ceo.add_child(vp_eng)
ceo.add_child(vp_sales)

eng_mgr_a = TreeNode("Dave Brown", "Engineering Manager", "2017-02-01")
eng_mgr_b = TreeNode("Eve Davis", "Engineering Manager", "2018-01-15")

vp_eng.add_child(eng_mgr_a)
vp_eng.add_child(eng_mgr_b)

# Now we can write richer queries
def find_by_title(node, target_title):
    """Find all employees with a given title"""
    results = []

    if node.title == target_title:
        results.append(node.name)

    for child in node.children:
        results.extend(find_by_title(child, target_title))

    return results

def print_org_chart(node, indent=0):
    """Pretty-print the organization chart"""
    print("  " * indent + f"β”œβ”€ {node.name} ({node.title})")
    for child in node.children:
        print_org_chart(child, indent + 1)

print("Engineering Managers:")
print(find_by_title(ceo, "Engineering Manager"))
print("\nOrganization Chart:")
print_org_chart(ceo)

Python Output:

Engineering Managers:
['Dave Brown', 'Eve Davis']

Organization Chart:
β”œβ”€ Alice Smith (Chief Executive Officer)
  β”œβ”€ Bob Jones (VP of Engineering)
    β”œβ”€ Dave Brown (Engineering Manager)
    β”œβ”€ Eve Davis (Engineering Manager)
  β”œβ”€ Carol White (VP of Sales)

Iteration 3: Adding Parent References

Sometimes we need to traverse upward in the tree (from child to parent). Let's add that capability:

class TreeNode:
    """Node with bidirectional links"""

    def __init__(self, name, title=None):
        self.name = name
        self.title = title
        self.parent = None      # Reference to parent node
        self.children = []

    def add_child(self, child_node):
        self.children.append(child_node)
        child_node.parent = self  # Set the parent reference

    def get_ancestors(self):
        """Get all ancestors from this node to root"""
        ancestors = []
        current = self.parent
        while current is not None:
            ancestors.append(current.name)
            current = current.parent
        return ancestors

    def get_depth(self):
        """Calculate depth (distance from root)"""
        depth = 0
        current = self.parent
        while current is not None:
            depth += 1
            current = current.parent
        return depth

    def __repr__(self):
        return f"TreeNode({self.name})"

# Rebuild with parent references
ceo = TreeNode("Alice Smith", "CEO")
vp_eng = TreeNode("Bob Jones", "VP Engineering")
vp_sales = TreeNode("Carol White", "VP Sales")

ceo.add_child(vp_eng)
ceo.add_child(vp_sales)

eng_mgr_a = TreeNode("Dave Brown", "Engineering Manager")
vp_eng.add_child(eng_mgr_a)

se1 = TreeNode("Frank Miller", "Senior Engineer")
eng_mgr_a.add_child(se1)

# Now we can traverse upward
print(f"{se1.name}'s reporting chain: {se1.get_ancestors()}")
print(f"{se1.name}'s depth: {se1.get_depth()}")
print(f"{vp_eng.name}'s depth: {vp_eng.get_depth()}")
print(f"{ceo.name}'s depth: {ceo.get_depth()}")

Python Output:

Frank Miller's reporting chain: ['Dave Brown', 'Bob Jones', 'Alice Smith']
Frank Miller's depth: 3
Bob Jones's depth: 1
Alice Smith's depth: 0

When to use parent references: - When you need to traverse upward frequently - When calculating paths between nodes - When implementing certain tree algorithms (like lowest common ancestor)

Trade-off: Parent references add memory overhead and complexity (must maintain consistency when modifying the tree).

Binary Tree Representation

Binary Tree Representation

Binary trees deserve their own class since they have a different structure (left/right instead of a list of children):

class BinaryTreeNode:
    """A node in a binary tree"""

    def __init__(self, value):
        self.value = value
        self.left = None    # Left child
        self.right = None   # Right child

    def __repr__(self):
        return f"BinaryTreeNode({self.value})"

# Build a simple binary tree
#       5
#      / \
#     3   8
#    / \
#   1   4

root = BinaryTreeNode(5)
root.left = BinaryTreeNode(3)
root.right = BinaryTreeNode(8)
root.left.left = BinaryTreeNode(1)
root.left.right = BinaryTreeNode(4)

def count_nodes(node):
    """Count all nodes in a binary tree"""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

def find_value(node, target):
    """Find if a value exists in the binary tree"""
    if node is None:
        return False
    if node.value == target:
        return True
    return find_value(node.left, target) or find_value(node.right, target)

def tree_height(node):
    """Calculate height of binary tree"""
    if node is None:
        return -1  # Height of empty tree is -1 by convention

    left_height = tree_height(node.left)
    right_height = tree_height(node.right)

    return 1 + max(left_height, right_height)

print(f"Total nodes: {count_nodes(root)}")
print(f"Contains 4? {find_value(root, 4)}")
print(f"Contains 7? {find_value(root, 7)}")
print(f"Tree height: {tree_height(root)}")

Python Output:

Total nodes: 5
Contains 4? True
Contains 7? False
Tree height: 2

Binary Tree vs General Tree: Code Comparison

Notice the pattern differences:

General Tree (n-ary):

def process_tree(node):
    # Process this node
    result = process(node.value)

    # Recurse on ALL children
    for child in node.children:
        result = combine(result, process_tree(child))

    return result

Binary Tree:

def process_tree(node):
    if node is None:
        return base_case

    # Process this node
    result = process(node.value)

    # Recurse on EXACTLY TWO children
    left_result = process_tree(node.left)
    right_result = process_tree(node.right)

    return combine(result, left_result, right_result)

The binary tree version is more explicit about the two recursive calls, while the general tree version uses a loop to handle variable numbers of children.

Visualizing Tree Execution

Visualizing Tree Execution

Let's trace how recursion actually executes on our tree structures. Understanding the call stack for tree recursion is crucial.

Manual Trace: Counting Nodes

Let's trace count_nodes() on this small tree:

    A
   / \
  B   C
 /
D
# Build the tree
a = BinaryTreeNode("A")
a.left = BinaryTreeNode("B")
a.right = BinaryTreeNode("C")
a.left.left = BinaryTreeNode("D")

def count_nodes_verbose(node, indent=0):
    """Count nodes with execution trace"""
    prefix = "  " * indent

    if node is None:
        print(f"{prefix}count_nodes(None) β†’ 0")
        return 0

    print(f"{prefix}count_nodes({node.value})")
    print(f"{prefix}  ↓ Counting left subtree...")
    left_count = count_nodes_verbose(node.left, indent + 2)

    print(f"{prefix}  ↓ Counting right subtree...")
    right_count = count_nodes_verbose(node.right, indent + 2)

    total = 1 + left_count + right_count
    print(f"{prefix}  ↑ Total for {node.value}: 1 + {left_count} + {right_count} = {total}")

    return total

result = count_nodes_verbose(a)
print(f"\nFinal result: {result}")

Python Output:

count_nodes(A)
  ↓ Counting left subtree...
    count_nodes(B)
      ↓ Counting left subtree...
        count_nodes(D)
          ↓ Counting left subtree...
            count_nodes(None) β†’ 0
          ↓ Counting right subtree...
            count_nodes(None) β†’ 0
          ↑ Total for D: 1 + 0 + 0 = 1
      ↓ Counting right subtree...
        count_nodes(None) β†’ 0
      ↑ Total for B: 1 + 1 + 0 = 2
  ↓ Counting right subtree...
    count_nodes(C)
      ↓ Counting left subtree...
        count_nodes(None) β†’ 0
      ↓ Counting right subtree...
        count_nodes(None) β†’ 0
      ↑ Total for C: 1 + 0 + 0 = 1
  ↑ Total for A: 1 + 2 + 1 = 4

Final result: 4

Execution Trace Analysis

Let's parse this execution section by section:

  1. The recursion pattern:
  2. Process current node
  3. Recurse left (goes deep before returning)
  4. Recurse right (only after left completes)
  5. Combine results

  6. The call stack depth:

  7. Maximum depth: 3 (A β†’ B β†’ D β†’ None)
  8. This equals the tree height + 1
  9. Each level of indentation represents one stack frame

  10. The order of execution:

  11. Depth-first: we go all the way down the left side before touching the right
  12. Post-order: we process children before parents (counts bubble up)

  13. The base case:

  14. None nodes return 0 immediately
  15. These are the leaves of the recursion tree
  16. Without this base case, we'd get AttributeError: 'NoneType' object has no attribute 'left'

Recursion Tree Visualization

The recursion tree (not the data tree!) shows all function calls:

                count_nodes(A)
               /              \
        count_nodes(B)      count_nodes(C)
        /           \         /          \
  count_nodes(D)  count_nodes(None)  count_nodes(None)  count_nodes(None)
   /          \
count_nodes(None) count_nodes(None)

Each node in this recursion tree represents one function call. The leaves are all base cases (None nodes).

Key insight: For a binary tree with n nodes, we make exactly 2n+1 function calls (n for actual nodes, n+1 for None children).

Common Tree Operations: The Essential Toolkit

Common Tree Operations: The Essential Toolkit

Now that we have proper tree structures, let's implement the fundamental operations you'll use repeatedly. These form the foundation for all tree algorithms.

Operation 1: Tree Size (Node Count)

We've seen this, but let's formalize it:

def size(node):
    """
    Count total nodes in tree.

    Base case: Empty tree has size 0
    Recursive case: 1 (this node) + size of all subtrees
    """
    if node is None:
        return 0

    # For binary tree:
    return 1 + size(node.left) + size(node.right)

def size_general(node):
    """Size for general (n-ary) tree"""
    if node is None:
        return 0

    count = 1  # This node
    for child in node.children:
        count += size_general(child)

    return count

Complexity Analysis: - Time: O(n) - visit each node exactly once - Space: O(h) - call stack depth equals tree height h - Recurrence: T(n) = T(left) + T(right) + O(1)

Operation 2: Tree Height (Maximum Depth)

def height(node):
    """
    Calculate height of tree (longest path to leaf).

    Base case: Empty tree has height -1 (by convention)
               Leaf node has height 0
    Recursive case: 1 + max height of subtrees
    """
    if node is None:
        return -1

    left_height = height(node.left)
    right_height = height(node.right)

    return 1 + max(left_height, right_height)

def height_general(node):
    """Height for general tree"""
    if node is None:
        return -1

    if not node.children:  # Leaf node
        return 0

    max_child_height = -1
    for child in node.children:
        child_height = height_general(child)
        max_child_height = max(max_child_height, child_height)

    return 1 + max_child_height

# Test with our tree
#     A
#    / \
#   B   C
#  /
# D

a = BinaryTreeNode("A")
a.left = BinaryTreeNode("B")
a.right = BinaryTreeNode("C")
a.left.left = BinaryTreeNode("D")

print(f"Tree height: {height(a)}")  # Should be 2
print(f"Height of B subtree: {height(a.left)}")  # Should be 1
print(f"Height of C subtree: {height(a.right)}")  # Should be 0

Python Output:

Tree height: 2
Height of B subtree: 1
Height of C subtree: 0

Complexity Analysis: - Time: O(n) - must visit all nodes to find longest path - Space: O(h) - call stack depth - Recurrence: T(n) = T(left) + T(right) + O(1)

Operation 3: Search (Find Value)

def contains(node, target):
    """
    Check if value exists in tree.

    Base case: Empty tree doesn't contain anything
    Recursive case: Check this node, then check all subtrees
    """
    if node is None:
        return False

    if node.value == target:
        return True

    # For binary tree: check both subtrees
    return contains(node.left, target) or contains(node.right, target)

def contains_general(node, target):
    """Search in general tree"""
    if node is None:
        return False

    if node.value == target:
        return True

    # Check all children
    for child in node.children:
        if contains_general(child, target):
            return True

    return False

# Test
print(f"Contains 'B'? {contains(a, 'B')}")  # True
print(f"Contains 'D'? {contains(a, 'D')}")  # True
print(f"Contains 'Z'? {contains(a, 'Z')}")  # False

Python Output:

Contains 'B'? True
Contains 'D'? True
Contains 'Z'? False

Complexity Analysis: - Time: O(n) worst case - might need to check all nodes - Space: O(h) - call stack depth - Best case: O(1) if target is at root - Average case: O(n) for unordered tree

Operation 4: Leaf Count

def count_leaves(node):
    """
    Count leaf nodes (nodes with no children).

    Base case: Empty tree has 0 leaves
    Leaf case: Node with no children is 1 leaf
    Recursive case: Sum leaves in all subtrees
    """
    if node is None:
        return 0

    # Check if this is a leaf
    if node.left is None and node.right is None:
        return 1

    # Not a leaf, count leaves in subtrees
    return count_leaves(node.left) + count_leaves(node.right)

def count_leaves_general(node):
    """Count leaves in general tree"""
    if node is None:
        return 0

    if not node.children:  # Leaf node
        return 1

    leaf_count = 0
    for child in node.children:
        leaf_count += count_leaves_general(child)

    return leaf_count

# Test
print(f"Number of leaves: {count_leaves(a)}")  # C and D are leaves = 2

Python Output:

Number of leaves: 2

Operation 5: Print All Values (Tree Traversal Preview)

def print_all_values(node, indent=0):
    """
    Print all values with indentation showing structure.
    This is a preorder traversal (node before children).
    """
    if node is None:
        return

    print("  " * indent + str(node.value))
    print_all_values(node.left, indent + 1)
    print_all_values(node.right, indent + 1)

def print_all_values_general(node, indent=0):
    """Print all values in general tree"""
    if node is None:
        return

    print("  " * indent + str(node.value))
    for child in node.children:
        print_all_values_general(child, indent + 1)

print("Tree structure:")
print_all_values(a)

Python Output:

Tree structure:
A
  B
    D
  C

This gives us a visual representation of the tree structure through indentation.

Lab: Build a Simple Tree Class

Lab: Build a Simple Tree Class

Now it's your turn to synthesize everything we've learned. You'll build a complete, production-ready tree class with all the operations we've discussed.

Lab Objectives

  1. Create a TreeNode class for general (n-ary) trees
  2. Implement all fundamental operations as methods
  3. Add useful utility methods
  4. Test thoroughly with real data

Starter Code

Here's the skeleton to complete:

class TreeNode:
    """
    A node in a general (n-ary) tree.

    Each node stores a value and maintains a list of children.
    Optionally maintains a parent reference for upward traversal.
    """

    def __init__(self, value, maintain_parent_refs=False):
        """
        Initialize a tree node.

        Args:
            value: The data to store in this node
            maintain_parent_refs: If True, maintain parent pointers
        """
        self.value = value
        self.children = []
        self.parent = None if maintain_parent_refs else None
        self._maintain_parents = maintain_parent_refs

    def add_child(self, child_node):
        """
        Add a child to this node.

        Args:
            child_node: TreeNode to add as child
        """
        self.children.append(child_node)
        if self._maintain_parents:
            child_node.parent = self

    def remove_child(self, child_node):
        """
        Remove a child from this node.

        Args:
            child_node: TreeNode to remove

        Returns:
            True if child was found and removed, False otherwise
        """
        try:
            self.children.remove(child_node)
            if self._maintain_parents:
                child_node.parent = None
            return True
        except ValueError:
            return False

    def size(self):
        """
        Count total nodes in subtree rooted at this node.

        Returns:
            Integer count of nodes
        """
        count = 1  # Count this node
        for child in self.children:
            count += child.size()
        return count

    def height(self):
        """
        Calculate height of subtree rooted at this node.
        Height is the longest path from this node to any leaf.

        Returns:
            Integer height (0 for leaf nodes)
        """
        if not self.children:  # Leaf node
            return 0

        max_child_height = 0
        for child in self.children:
            child_height = child.height()
            max_child_height = max(max_child_height, child_height)

        return 1 + max_child_height

    def depth(self):
        """
        Calculate depth of this node (distance from root).
        Only works if parent references are maintained.

        Returns:
            Integer depth (0 for root)

        Raises:
            RuntimeError if parent references not maintained
        """
        if not self._maintain_parents:
            raise RuntimeError("depth() requires maintain_parent_refs=True")

        depth = 0
        current = self.parent
        while current is not None:
            depth += 1
            current = current.parent
        return depth

    def is_leaf(self):
        """Check if this node is a leaf (has no children)"""
        return len(self.children) == 0

    def is_root(self):
        """
        Check if this node is the root (has no parent).
        Only works if parent references are maintained.
        """
        if not self._maintain_parents:
            raise RuntimeError("is_root() requires maintain_parent_refs=True")
        return self.parent is None

    def count_leaves(self):
        """Count leaf nodes in subtree rooted at this node"""
        if self.is_leaf():
            return 1

        leaf_count = 0
        for child in self.children:
            leaf_count += child.count_leaves()
        return leaf_count

    def contains(self, target_value):
        """
        Check if target value exists in subtree.

        Args:
            target_value: Value to search for

        Returns:
            True if found, False otherwise
        """
        if self.value == target_value:
            return True

        for child in self.children:
            if child.contains(target_value):
                return True

        return False

    def find(self, target_value):
        """
        Find and return the node with target value.

        Args:
            target_value: Value to search for

        Returns:
            TreeNode if found, None otherwise
        """
        if self.value == target_value:
            return self

        for child in self.children:
            result = child.find(target_value)
            if result is not None:
                return result

        return None

    def get_ancestors(self):
        """
        Get list of ancestor values from parent to root.
        Only works if parent references are maintained.

        Returns:
            List of values from parent to root
        """
        if not self._maintain_parents:
            raise RuntimeError("get_ancestors() requires maintain_parent_refs=True")

        ancestors = []
        current = self.parent
        while current is not None:
            ancestors.append(current.value)
            current = current.parent
        return ancestors

    def print_tree(self, indent=0):
        """
        Pretty-print the tree structure.

        Args:
            indent: Current indentation level (used internally)
        """
        print("  " * indent + f"β”œβ”€ {self.value}")
        for child in self.children:
            child.print_tree(indent + 1)

    def __repr__(self):
        """String representation for debugging"""
        return f"TreeNode({self.value})"

    def __str__(self):
        """Human-readable string representation"""
        return str(self.value)

Lab Exercise 1: Build an Organization Chart

Use the TreeNode class to build a complete organization chart:

# Build the organization
ceo = TreeNode("Alice (CEO)", maintain_parent_refs=True)

vp_eng = TreeNode("Bob (VP Engineering)")
vp_sales = TreeNode("Carol (VP Sales)")
vp_ops = TreeNode("David (VP Operations)")

ceo.add_child(vp_eng)
ceo.add_child(vp_sales)
ceo.add_child(vp_ops)

# Engineering department
eng_mgr_a = TreeNode("Eve (Eng Manager)")
eng_mgr_b = TreeNode("Frank (Eng Manager)")
vp_eng.add_child(eng_mgr_a)
vp_eng.add_child(eng_mgr_b)

se1 = TreeNode("Grace (Senior Engineer)")
se2 = TreeNode("Henry (Senior Engineer)")
je1 = TreeNode("Iris (Junior Engineer)")

eng_mgr_a.add_child(se1)
eng_mgr_a.add_child(se2)
eng_mgr_b.add_child(je1)

# Sales department
sales_mgr = TreeNode("Jack (Sales Manager)")
vp_sales.add_child(sales_mgr)

sr1 = TreeNode("Kelly (Sales Rep)")
sr2 = TreeNode("Leo (Sales Rep)")
sales_mgr.add_child(sr1)
sales_mgr.add_child(sr2)

# Operations department
ops_mgr = TreeNode("Mia (Ops Manager)")
vp_ops.add_child(ops_mgr)

# Test all operations
print("=== Organization Chart ===")
ceo.print_tree()

print(f"\n=== Statistics ===")
print(f"Total employees: {ceo.size()}")
print(f"Organization depth: {ceo.height()}")
print(f"Number of individual contributors: {ceo.count_leaves()}")

print(f"\n=== Search Operations ===")
print(f"Is Grace in the org? {ceo.contains('Grace (Senior Engineer)')}")
print(f"Is Zeus in the org? {ceo.contains('Zeus')}")

grace_node = ceo.find("Grace (Senior Engineer)")
if grace_node:
    print(f"\nGrace's reporting chain: {grace_node.get_ancestors()}")
    print(f"Grace's depth in org: {grace_node.depth()}")

print(f"\n=== Department Sizes ===")
print(f"Engineering department size: {vp_eng.size()}")
print(f"Sales department size: {vp_sales.size()}")
print(f"Operations department size: {vp_ops.size()}")

Python Output:

=== Organization Chart ===
β”œβ”€ Alice (CEO)
  β”œβ”€ Bob (VP Engineering)
    β”œβ”€ Eve (Eng Manager)
      β”œβ”€ Grace (Senior Engineer)
      β”œβ”€ Henry (Senior Engineer)
    β”œβ”€ Frank (Eng Manager)
      β”œβ”€ Iris (Junior Engineer)
  β”œβ”€ Carol (VP Sales)
    β”œβ”€ Jack (Sales Manager)
      β”œβ”€ Kelly (Sales Rep)
      β”œβ”€ Leo (Sales Rep)
  β”œβ”€ David (VP Operations)
    β”œβ”€ Mia (Ops Manager)

=== Statistics ===
Total employees: 13
Organization depth: 3
Number of individual contributors: 6

=== Search Operations ===
Is Grace in the org? True
Is Zeus in the org? False

Grace's reporting chain: ['Eve (Eng Manager)', 'Bob (VP Engineering)', 'Alice (CEO)']
Grace's depth in org: 3

=== Department Sizes ===
Engineering department size: 6
Sales department size: 4
Operations department size: 2

Lab Exercise 2: File System Tree

Build a file system representation:

# Build a file system tree
root = TreeNode("/")

home = TreeNode("home")
root.add_child(home)

user = TreeNode("user")
home.add_child(user)

documents = TreeNode("documents")
downloads = TreeNode("downloads")
user.add_child(documents)
user.add_child(downloads)

doc1 = TreeNode("report.pdf")
doc2 = TreeNode("notes.txt")
documents.add_child(doc1)
documents.add_child(doc2)

file1 = TreeNode("image.jpg")
file2 = TreeNode("video.mp4")
downloads.add_child(file1)
downloads.add_child(file2)

etc = TreeNode("etc")
root.add_child(etc)

config = TreeNode("config.ini")
etc.add_child(config)

print("=== File System Structure ===")
root.print_tree()

print(f"\n=== File System Statistics ===")
print(f"Total items: {root.size()}")
print(f"Directory depth: {root.height()}")
print(f"Number of files (leaves): {root.count_leaves()}")

print(f"\n=== Search Operations ===")
print(f"Contains 'report.pdf'? {root.contains('report.pdf')}")
print(f"Contains 'missing.txt'? {root.contains('missing.txt')}")

print(f"\n=== Directory Sizes ===")
print(f"Items in /home/user: {user.size()}")
print(f"Items in /home/user/documents: {documents.size()}")

Python Output:

=== File System Structure ===
β”œβ”€ /
  β”œβ”€ home
    β”œβ”€ user
      β”œβ”€ documents
        β”œβ”€ report.pdf
        β”œβ”€ notes.txt
      β”œβ”€ downloads
        β”œβ”€ image.jpg
        β”œβ”€ video.mp4
  β”œβ”€ etc
    β”œβ”€ config.ini

=== File System Statistics ===
Total items: 11
Directory depth: 4
Number of files (leaves): 5

=== Search Operations ===
Contains 'report.pdf'? True
Contains 'missing.txt'? False

=== Directory Sizes ===
Items in /home/user: 7
Items in /home/user/documents: 3

Lab Exercise 3: Challenge Problems

Now try these on your own:

  1. Add a get_path() method that returns the full path from root to a node (e.g., "/home/user/documents/report.pdf")

  2. Add a find_all() method that returns a list of ALL nodes matching a value (not just the first one)

  3. Add a level_order_values() method that returns values level by level (breadth-first)

  4. Add a max_value() method that finds the maximum value in the tree (assuming numeric values)

  5. Add a prune() method that removes all leaf nodes with a specific value

Here's a template for the challenge methods:

class TreeNode:
    # ... (previous methods) ...

    def get_path(self):
        """
        Get the full path from root to this node.
        Requires parent references.

        Returns:
            String path like "/home/user/documents"
        """
        if not self._maintain_parents:
            raise RuntimeError("get_path() requires maintain_parent_refs=True")

        # YOUR CODE HERE
        # Hint: Build path by traversing up to root, then reverse
        pass

    def find_all(self, target_value):
        """
        Find all nodes with target value.

        Args:
            target_value: Value to search for

        Returns:
            List of TreeNode objects
        """
        # YOUR CODE HERE
        # Hint: Accumulate results from this node and all children
        pass

    def level_order_values(self):
        """
        Get values level by level (breadth-first).

        Returns:
            List of lists, where each inner list is one level
        """
        # YOUR CODE HERE
        # Hint: Use a queue to track nodes at each level
        pass

    def max_value(self):
        """
        Find maximum value in tree (assumes numeric values).

        Returns:
            Maximum value found
        """
        # YOUR CODE HERE
        # Hint: Compare this node's value with max of all children
        pass

    def prune(self, target_value):
        """
        Remove all leaf nodes with target value.

        Args:
            target_value: Value of leaves to remove

        Returns:
            Number of nodes removed
        """
        # YOUR CODE HERE
        # Hint: Recursively prune children first, then check if this became a leaf
        pass

# Test your implementations here

Solutions to Challenge Problems

Here are reference implementations:

class TreeNode:
    # ... (previous methods) ...

    def get_path(self):
        """Get full path from root to this node"""
        if not self._maintain_parents:
            raise RuntimeError("get_path() requires maintain_parent_refs=True")

        path_parts = [str(self.value)]
        current = self.parent

        while current is not None:
            path_parts.append(str(current.value))
            current = current.parent

        path_parts.reverse()
        return "/" + "/".join(path_parts)

    def find_all(self, target_value):
        """Find all nodes with target value"""
        results = []

        if self.value == target_value:
            results.append(self)

        for child in self.children:
            results.extend(child.find_all(target_value))

        return results

    def level_order_values(self):
        """Get values level by level"""
        if not self:
            return []

        result = []
        current_level = [self]

        while current_level:
            level_values = [node.value for node in current_level]
            result.append(level_values)

            next_level = []
            for node in current_level:
                next_level.extend(node.children)

            current_level = next_level

        return result

    def max_value(self):
        """Find maximum value (assumes numeric values)"""
        max_val = self.value

        for child in self.children:
            child_max = child.max_value()
            max_val = max(max_val, child_max)

        return max_val

    def prune(self, target_value):
        """Remove all leaf nodes with target value"""
        removed_count = 0

        # First, recursively prune all children
        for child in self.children[:]:  # Copy list to avoid modification during iteration
            removed_count += child.prune(target_value)

        # Then remove children that became leaves with target value
        for child in self.children[:]:
            if child.is_leaf() and child.value == target_value:
                self.remove_child(child)
                removed_count += 1

        return removed_count

# Test the solutions
test_tree = TreeNode(10)
test_tree.add_child(TreeNode(5))
test_tree.add_child(TreeNode(15))
test_tree.children[0].add_child(TreeNode(3))
test_tree.children[0].add_child(TreeNode(7))
test_tree.children[1].add_child(TreeNode(12))
test_tree.children[1].add_child(TreeNode(20))

print("Level order:", test_tree.level_order_values())
print("Max value:", test_tree.max_value())

# Test find_all with duplicate values
dup_tree = TreeNode(5)
dup_tree.add_child(TreeNode(3))
dup_tree.add_child(TreeNode(5))
dup_tree.children[0].add_child(TreeNode(5))

print(f"Found {len(dup_tree.find_all(5))} nodes with value 5")

Python Output:

Level order: [[10], [5, 15], [3, 7, 12, 20]]
Max value: 20
Found 3 nodes with value 5

The Journey: From Problem to Solution

The Journey: From Problem to Solution

Let's review how we progressed from a problematic dictionary representation to a robust tree class:

Iteration Problem Solution Applied Result
0 Nested dicts, root not found None Awkward, buggy
1 Need proper structure TreeNode class with children Clean, works correctly
2 Need metadata storage Add attributes to TreeNode Flexible data storage
3 Need upward traversal Add parent references Bidirectional navigation
4 Need complete operations toolkit Implement all standard methods Production-ready class

Final Implementation Summary

Our complete TreeNode class provides:

Core Structure: - Value storage - Children list (for n-ary trees) - Optional parent references

Fundamental Operations (O(n) time, O(h) space): - size() - count all nodes - height() - longest path to leaf - depth() - distance from root - contains() - search for value - find() - return node with value - count_leaves() - count leaf nodes

Utility Methods: - is_leaf() - check if node is leaf - is_root() - check if node is root - get_ancestors() - path to root - print_tree() - visual representation

Advanced Operations (challenge problems): - get_path() - full path string - find_all() - find all matching nodes - level_order_values() - breadth-first traversal - max_value() - find maximum - prune() - remove matching leaves

Complexity Analysis

Time Complexity: - Most operations: O(n) where n = number of nodes - Must visit each node once in worst case - No operation is faster than O(h) where h = height

Space Complexity: - Call stack depth: O(h) for recursive operations - For balanced trees: h = O(log n) - For degenerate trees (linear): h = O(n)

Recurrence Relations: - Size: T(n) = T(c₁) + T(cβ‚‚) + ... + T(cβ‚–) + O(1) - Height: T(n) = max(T(c₁), T(cβ‚‚), ..., T(cβ‚–)) + O(1) - Search: T(n) = T(c₁) + T(cβ‚‚) + ... + T(cβ‚–) + O(1)

Where c₁, cβ‚‚, ..., cβ‚– are the children of the current node.

When to Use This Tree Structure

Choose general (n-ary) trees when: - Modeling real-world hierarchies (org charts, file systems) - Number of children varies naturally - Flexibility is more important than algorithmic constraints

Choose binary trees when: - Implementing search trees (BST, AVL, Red-Black) - Building heaps for priority queues - Representing mathematical expressions - The two-child constraint provides algorithmic advantages

Key Lessons Learned

  1. Separate data from structure: Don't conflate the node's value with its relationships

  2. Make the root explicit: The root should be a first-class object, not implicit in the container

  3. Choose your invariants: Decide whether to maintain parent references based on your use case

  4. Recursion is natural for trees: Every tree operation follows the pattern "process this node, recurse on children"

  5. Base cases are crucial: Always handle empty trees (None) and leaf nodes explicitly

  6. Visualization helps debugging: The print_tree() method is invaluable for understanding structure

  7. Test with real data: Organization charts and file systems are excellent test cases because they expose edge cases naturally

Looking Ahead: Tree Traversals

Looking Ahead: Tree Traversals

We've built the foundationβ€”a solid tree structure with basic operations. But we've only scratched the surface of what we can do with trees.

In the next chapter, we'll explore tree traversals: systematic ways to visit every node in a tree. You've already seen hints of this:

But there are many more traversal patterns, each suited to different problems:

Depth-First Traversals: - Preorder: Process node before children (what we've been doing) - Postorder: Process children before node (useful for deletion, evaluation) - Inorder: Only for binary trees (used in BSTs for sorted output)

Breadth-First Traversal: - Level-order: Process all nodes at depth d before depth d+1

Each traversal pattern reveals different properties of the tree and enables different algorithms. Understanding when to use each traversal is key to mastering tree algorithms.

Preview: Why Traversals Matter

Consider these problems:

  1. Deleting a tree: Must delete children before parent (postorder)
  2. Evaluating an expression tree: Must evaluate operands before operator (postorder)
  3. Printing a sorted BST: Must visit left subtree, then node, then right subtree (inorder)
  4. Finding shortest path: Must explore level by level (breadth-first)

The traversal pattern you choose determines whether your algorithm is elegant or awkward, efficient or wasteful.

What You've Mastered

Before moving on, make sure you can:

βœ“ Explain the difference between general and binary trees βœ“ Define: node, edge, root, parent, child, leaf, ancestor, descendant, depth, height βœ“ Implement a TreeNode class with all fundamental operations βœ“ Trace recursive execution on tree structures βœ“ Calculate time and space complexity for tree operations βœ“ Choose appropriate tree structures for different problems

If you can do all of these, you're ready for tree traversals. If not, review the sections above and work through the lab exercises again.

The journey continues in Chapter 18: Tree Traversals.